Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition
Abstract
In this paper, we show how a multi-level class hierarchy can be used to better estimate the likelihood of an unseen event. In classical backoff n-gram models, the (n-1)-gram model is used to estimate the probability of an unseen n-gram. In the approach we propose, we use a class hierarchy to define an appropriate context which is more general than the unseen n-gram but more specific than the (n-1)-gram. Each node in the hierarchy is a class containing all the words of its descendant nodes (classes). Hence, the closer a node is to the root, the more general the corresponding class is. We also investigate the impact of the hierarchy depth and of the Turing discount coefficient on the performance of the model. We evaluate the backoff hierarchical n-gram models on the WSJ database with two large vocabularies, 5,000 and 20,000 words. Experiments show up to 26% improvement in the perplexity of unseen events and up to 12% improvement in WER when a backoff hierarchical class trigram language model is used on an ASR test set with a relatively large number of unseen events.
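The backoff idea described above can be sketched in a few lines: for an unseen bigram (h, w), instead of falling straight back to the unigram, we first try contexts generalized to h's classes, from most specific to most general. The toy corpus, the two-level class hierarchy, and the undiscounted maximum-likelihood estimates below are purely illustrative assumptions, not the paper's exact formulation (which applies Turing discounting):

```python
from collections import Counter

# Toy corpus of (history, word) bigram pairs (illustrative only).
pairs = [("cat", "purrs"), ("cat", "sleeps"), ("dog", "barks"),
         ("dog", "sleeps"), ("lion", "roars")]

# Hypothetical hierarchy: word -> leaf class, class -> parent (None at root).
word_class = {"cat": "pet", "dog": "pet", "lion": "wild"}
class_parent = {"pet": "animal", "wild": "animal", "animal": None}

bigram = Counter(pairs)
unigram = Counter(w for _, w in pairs)
total = sum(unigram.values())

def chain(cls):
    """Classes from most specific to most general (leaf to root)."""
    out = []
    while cls is not None:
        out.append(cls)
        cls = class_parent[cls]
    return out

def members(cls):
    """All words whose class chain passes through `cls`."""
    return [w for w, c in word_class.items() if cls in chain(c)]

def p(w, h):
    """P(w | h): use the seen bigram if possible; otherwise back off
    through h's classes, and finally to the unigram distribution."""
    h_count = sum(c for (x, _), c in bigram.items() if x == h)
    if bigram[(h, w)] > 0:
        return bigram[(h, w)] / h_count
    for cls in chain(word_class[h]):            # most specific class first
        ws = members(cls)
        c_cw = sum(bigram[(x, w)] for x in ws)  # w after any member of cls
        c_c = sum(c for (x, _), c in bigram.items() if x in ws)
        if c_cw > 0:
            return c_cw / c_c                   # no discounting in this sketch
    return unigram[w] / total
```

For example, "lion sleeps" is unseen, and so is any count under the class "wild", so the model backs off to the class "animal", where "sleeps" has been observed twice in five events: `p("sleeps", "lion")` returns 0.4, a more specific estimate than the bare unigram.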
Similar papers
Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension
We present an extension to the SRILM toolkit for training maximum entropy language models with N-gram features. The extension uses a hierarchical parameter estimation procedure [1] for making the training time and memory consumption feasible for moderately large training data (hundreds of millions of words). Experiments on two speech recognition tasks indicate that the models trained with our ...
Decoding with shrinkage-based language models
In this paper, we investigate the use of a class-based exponential language model when directly integrated into speech recognition or machine translation decoders. Recently, a novel class-based language model, Model M, was introduced and was shown to outperform regular n-gram models on moderate amounts of Wall Street Journal data. This model was motivated by the observation that shrinking the s...
Morpheme level hierarchical Pitman-Yor class-based language models for LVCSR of morphologically rich languages
Performing large vocabulary continuous speech recognition (LVCSR) for morphologically rich languages is considered a challenging task. The morphological richness of such languages leads to high out-of-vocabulary (OOV) rates and poor language model (LM) probabilities. In this case, the use of morphemes has been shown to increase the lexical coverage and lower the LM perplexity. Another approach ...
Towards a unified framework for sub-lexical and supra-lexical linguistic modeling
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains to be one of the fundam...